Abstract. The measure-theoretic definition of Kullback-Leibler relative-entropy
(KL-entropy) plays a basic role in the definitions of classical information measures. 
Entropy, mutual information and conditional forms of entropy can be expressed in 
terms of KL-entropy and hence properties of their measure-theoretic analogs will follow 
from those of measure-theoretic KL-entropy. These measure-theoretic definitions are 
key to extending the ergodic theorems of information theory to non-discrete cases. 
A fundamental theorem in this respect is the Gelfand-Yaglom-Perez (GYP) Theorem 
(Pinsker, 1960, Theorem 2.4.2), which states that measure-theoretic relative-entropy
equals the supremum of relative-entropies over all measurable partitions. This paper 
states and proves the GYP-theorem for Renyi relative-entropy of order greater than 
one. Consequently, the result can be easily extended to Tsallis relative-entropy.

1. Introduction

Renyi [1], by replacing the linear averaging in Shannon entropy with the Kolmogorov-Nagumo
average (quasilinear mean) and further imposing the additivity constraint, proposed
a one-parameter family of measures of information ($\alpha$-entropies), defined as
follows:

$$ S_\alpha(p) = \frac{1}{1-\alpha} \ln \sum_{k=1}^{n} p_k^{\alpha} , \tag{1} $$

where $p = \{p_k\}_{k=1}^{n}$ is a probability mass function (pmf), $\alpha \in \mathbb{R}$, $\alpha > 0$ and $\alpha \neq 1$. Renyi
entropy (1) is a one-parameter generalization of Shannon entropy in the sense that the
limit $\alpha \to 1$ in (1) retrieves Shannon entropy. $S_\alpha$ is referred to as the entropy of order
$\alpha$. Despite its formal origin, Renyi entropy proved important in a variety of practical
applications in coding theory [2], statistical inference [3], quantum mechanics [4], and
chaotic dynamical systems [5].
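
As a quick numerical illustration of the limit $\alpha \to 1$, the following Python sketch evaluates the discrete Renyi entropy for orders approaching one and compares it with Shannon entropy. The function names and the example pmf are illustrative assumptions, not part of the original text; natural logarithms are used throughout.

```python
import math

def shannon_entropy(p):
    """Shannon entropy -sum_k p_k ln p_k of a pmf (natural log, 0 ln 0 = 0)."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha > 0, alpha != 1: ln(sum_k p_k^alpha) / (1 - alpha)."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log(sum(pk ** alpha for pk in p if pk > 0)) / (1.0 - alpha)

p = [0.5, 0.25, 0.125, 0.125]  # hypothetical example pmf
for alpha in (0.5, 0.9, 0.999, 1.001, 2.0):
    print(f"alpha={alpha}: S_alpha={renyi_entropy(p, alpha):.6f}")
print(f"Shannon: {shannon_entropy(p):.6f}")
```

For this pmf the printed values should decrease as $\alpha$ grows and bracket the Shannon value near $\alpha = 1$.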

Along similar lines, Renyi defined a one-parameter generalization of Kullback-Leibler
relative-entropy as [1]

$$ I_\alpha(p\|r) = \frac{1}{\alpha-1} \ln \sum_{k=1}^{n} \frac{p_k^{\alpha}}{r_k^{\alpha-1}} \tag{2} $$

for pmfs $p$ and $r$.

On the other hand, though the Shannon measure of entropy or information was
developed essentially for the case when the random variable takes a finite number of
values, in the literature one often encounters an extension of Shannon entropy in the
discrete case to the case of a one-dimensional random variable with density function $p$,
in the form (e.g., [6, 7])

$$ S(p) = -\int_{-\infty}^{+\infty} p(x) \ln p(x) \, dx . \tag{3} $$

(3) is known as differential entropy in information theory and as the Boltzmann H-function
in physics. Indeed, during the early stages of development of information theory, the
important paper by Gelfand, Kolmogorov and Yaglom [8] called attention to the case
where entropy is defined on an arbitrary measure space $(X, \mathfrak{M}, \mu)$. In this respect,
Shannon entropy of a probability density function $p : X \to \mathbb{R}^+$ can be defined as

$$ S(p) = -\int_X p \ln p \, d\mu , \tag{4} $$

provided the integral on the right exists. One can see from the above definition that the
concept of "entropy of a pdf" is a misnomer: there is always another measure $\mu$ in the
background. In the discrete case considered by Shannon, $\mu$ is the cardinality measure§
(Shannon, pp. 19); in the continuous case considered by both Shannon and Wiener, $\mu$ is the
Lebesgue measure (cf. Shannon, pp. 54, and Wiener, pp. 61, 62). All entropies are defined with
respect to some measure $\mu$, as Shannon (pp. 57, 58) and Wiener (pp. 61, 62) both emphasized.

§ The counting or cardinality measure $\mu$ on a measurable space $(X, \mathfrak{M})$, where $X$ is a finite set and
$\mathfrak{M} = 2^X$, is defined as $\mu(E) = \#E$, $\forall E \in \mathfrak{M}$.

This case was studied independently by Kallianpur [10] and Pinsker [11], and
perhaps others, who were guided by the earlier work of Kullback and Leibler [12], where
one would define entropy in terms of Kullback-Leibler relative-entropy.

In this respect, the Gelfand-Yaglom-Perez theorem (GYP-theorem) plays
an important role: it equips measure-theoretic KL-entropy with a fundamental
definition. The main contribution of this paper is to state and prove the GYP-theorem
for Renyi relative-entropy of order $\alpha > 1$.

We review the measure-theoretic formalisms for classical information measures
in § 2, where we discuss the relation between Shannon entropy and KL-entropy in
the measure-theoretic case. We extend measure-theoretic definitions to generalized
information measures in § 3. Finally, the Gelfand-Yaglom-Perez theorem in the general
case is presented in § 4.

2. Measure Theoretic Definitions of Classical Information Measures 

Let $(X, \mathfrak{M}, \mu)$ be a measure space. $\mu$ need not be a probability measure unless otherwise
specified. Symbols $P$, $R$ will denote probability measures on the measurable space $(X, \mathfrak{M})$
and $p$, $r$ denote $\mathfrak{M}$-measurable functions on $X$. An $\mathfrak{M}$-measurable function $p : X \to \mathbb{R}^+$
is said to be a probability density function (pdf) if $\int_X p \, d\mu = 1$.

In this general setting, entropy $S(p)$ of pdf $p$ defined in (4) can be referred to as
the entropy of the probability measure $P$, in the sense that the measure $P$ is induced
by $p$, i.e.,

$$ P(E) = \int_E p(x) \, d\mu(x) , \quad \forall E \in \mathfrak{M} . \tag{5} $$

This reference is consistent|| because the probability measure $P$ can be identified a.e.
by the pdf $p$. Further, the definition of the probability measure $P$ in (5) allows one to
write the entropy functional as

$$ S(p) = -\int_X \frac{dP}{d\mu} \ln \frac{dP}{d\mu} \, d\mu , \tag{6} $$

since (5) implies¶ $P \ll \mu$, and the pdf $p$ is the Radon-Nikodym derivative of $P$ w.r.t. $\mu$.

Now we proceed to the definition of Kullback-Leibler relative-entropy or KL-entropy
for probability measures.

|| Say $p$ and $r$ are two pdfs and $P$ and $R$ are the corresponding induced measures on the measurable space
$(X, \mathfrak{M})$ such that $P$ and $R$ are identical, i.e., $\int_E p \, d\mu = \int_E r \, d\mu$, $\forall E \in \mathfrak{M}$. Then we have $p = r$ a.e. and
hence $-\int_X p \ln p \, d\mu = -\int_X r \ln r \, d\mu$.

¶ If a nonnegative measurable function $f$ induces a measure $\nu$ on the measurable space $(X, \mathfrak{M})$ with respect
to a measure $\mu$, defined as $\nu(E) = \int_E f \, d\mu$, $\forall E \in \mathfrak{M}$, then $\nu \ll \mu$. The converse is given by the Radon-Nikodym
theorem (pp. 36, Theorem 1.40(b)).

Definition 2.1. Let $P$ and $R$ be two probability measures on the measurable space $(X, \mathfrak{M})$.
Kullback-Leibler relative-entropy of $P$ relative to $R$ is defined as

$$ I(P\|R) = \begin{cases} \displaystyle\int_X \ln \frac{dP}{dR} \, dP & \text{if } P \ll R , \\ +\infty & \text{otherwise.} \end{cases} \tag{7} $$

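The divergence inequality $I(P\|R) \geq 0$, with $I(P\|R) = 0$ if and only if $P = R$, can
be shown in this case too. Relative-entropy (7) can also be written as

$$ I(P\|R) = \int_X \frac{dP}{dR} \ln \frac{dP}{dR} \, dR . \tag{8} $$

Let $\mu$ be a $\sigma$-finite measure on $(X, \mathfrak{M})$ such that $P \ll R \ll \mu$. Then (7) can be
written as

$$ I(P\|R) = \int_X p(x) \ln \frac{p(x)}{r(x)} \, d\mu(x) , \tag{9} $$

provided the integral on the right exists. The pdfs $p(x)$ and $r(x)$ in (9) are the Radon-Nikodym
derivatives of $P$ and $R$ with respect to $\mu$, i.e., $p = \frac{dP}{d\mu}$ and $r = \frac{dR}{d\mu}$. Here and in the
sequel we use the convention

$$ \ln 0 = -\infty , \quad \ln \frac{a}{0} = +\infty \ \text{for any } a \in \mathbb{R} , \quad 0 \cdot (\pm\infty) = 0 . \tag{10} $$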

Shannon entropy in (6) is defined for a probability measure that is induced by a
pdf. By the Radon-Nikodym theorem, one can define Shannon entropy for any arbitrary
$\mu$-continuous probability measure as follows.

Definition 2.2. Let $(X, \mathfrak{M}, \mu)$ be a $\sigma$-finite measure space. Entropy of any $\mu$-continuous
probability measure $P$ ($P \ll \mu$) is defined as

$$ S(P) = -\int_X \ln \frac{dP}{d\mu} \, dP . \tag{11} $$

Properties of entropy of a probability measure in Definition 2.2 are studied
in detail by Ochs [17]. In the literature, one can find notation of the form $S(P|\mu)$
to represent the entropy functional in (11), viz., the entropy of a probability measure,
to stress the role of the measure $\mu$. Since all the information
measures we define are with respect to the measure $\mu$ on $(X, \mathfrak{M})$, we omit $\mu$ in the
entropy functional notation.

By assuming $\mu$ to be a probability measure in Definition 2.2, one can relate Shannon
entropy with Kullback-Leibler entropy as

$$ S(P) = -I(P\|\mu) . \tag{12} $$

Note that when $\mu$ is not a probability measure, the divergence inequality $I(P\|\mu) \geq 0$
need not be satisfied.
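
The relation (12), and the failure of the divergence inequality when the reference measure is not a probability measure, can be checked on a finite set, where $dP/d\mu$ at a point $k$ is simply $P_k / \mu_k$. The sketch below is illustrative only: the function names and the example measures are assumptions introduced here, and $\mu_k > 0$ is assumed wherever $P_k > 0$.

```python
import math

def entropy_wrt(P, mu):
    """S(P) = -sum_k P_k ln(P_k / mu_k); assumes mu_k > 0 wherever P_k > 0."""
    return -sum(Pk * math.log(Pk / mk) for Pk, mk in zip(P, mu) if Pk > 0)

def kl_entropy(P, R):
    """sum_k P_k ln(P_k / R_k), or +inf if P is not absolutely continuous w.r.t. R."""
    total = 0.0
    for Pk, Rk in zip(P, R):
        if Pk == 0:
            continue  # convention 0 ln 0 = 0
        if Rk == 0:
            return math.inf
        total += Pk * math.log(Pk / Rk)
    return total

P = [0.5, 0.25, 0.125, 0.125]   # hypothetical probability measure on four points
uniform = [0.25] * 4            # reference measure that is a probability measure
counting = [1.0] * 4            # counting measure: total mass 4, not a probability measure

print(entropy_wrt(P, uniform), -kl_entropy(P, uniform))   # the two values agree
print(kl_entropy(P, uniform))                             # nonnegative
print(kl_entropy(P, counting))                            # negative: equals minus the Shannon entropy
```

With the counting measure as reference, `entropy_wrt` returns the usual Shannon entropy, while the same sum used as a "relative entropy" becomes negative.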

Before we conclude this section, we make a note on the $\sigma$-finiteness of the measure
$\mu$. In the measure-theoretic definitions of Shannon entropy we assumed that $\mu$ is a
$\sigma$-finite measure. This condition was used by Ochs [17], Csiszar and Rosenblatt-Roth [20]
to tailor the measure-theoretic definitions. For all practical purposes and for
most applications this assumption is satisfied. (The physical interpretation of a
measurable space $(X, \mathfrak{M})$ with a $\sigma$-finite measure $\mu$ for entropic measures
of the form (11), and the relaxation of the $\sigma$-finiteness condition, are discussed in the literature.)
By relaxing this condition, more universal definitions of entropy functionals are studied by Masani.

3. Measure-Theoretic Definitions of Generalized Information Measures 

We begin with a brief note on the notation and assumptions used. We define all the
information measures on the measurable space $(X, \mathfrak{M})$, and the default reference measure
is $\mu$ unless otherwise stated. To avoid clumsy formulations, we will not distinguish
between functions differing on a $\mu$-null set only; nevertheless, we can work with equations
between $\mathfrak{M}$-measurable functions on $X$ if they are stated as being valid only $\mu$-almost
everywhere ($\mu$-a.e. or a.e.). Further, we assume that all the quantities of interest exist
and assume, implicitly, the $\sigma$-finiteness of $\mu$ and the $\mu$-continuity of probability measures
whenever required. Since these assumptions occur repeatedly in various definitions
and formulations, they will not be mentioned in the sequel. With these assumptions
we do not distinguish between an information measure of a pdf $p$ and that of the corresponding
probability measure $P$. Hence, when we give definitions of information measures for pdfs, we
also use the corresponding definitions for probability measures, whenever it is convenient
or required, with the understanding that $P(E) = \int_E p \, d\mu$, the converse being due to
the Radon-Nikodym theorem, where $p = \frac{dP}{d\mu}$.

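Similar to the definition of Shannon entropy (4), one can extend the Renyi entropy
in the discrete case (1) to the measure-theoretic case as follows.

Definition 3.1. Renyi entropy of a pdf $p : X \to \mathbb{R}^+$ on $(X, \mathfrak{M}, \mu)$ is defined as

$$ S_\alpha(p) = \frac{1}{1-\alpha} \ln \int_X p(x)^{\alpha} \, d\mu(x) , \tag{13} $$

provided the integral on the right exists and $\alpha \in \mathbb{R}$, $\alpha > 0$, $\alpha \neq 1$.

The same can be written for any $\mu$-continuous probability measure $P$ as

$$ S_\alpha(P) = \frac{1}{1-\alpha} \ln \int_X \left( \frac{dP}{d\mu} \right)^{\alpha-1} dP . \tag{14} $$

On the other hand, Renyi relative-entropy can be defined as follows.

Definition 3.2. Let $p, r : X \to \mathbb{R}^+$ be two pdfs defined on $(X, \mathfrak{M}, \mu)$. Renyi
relative-entropy of $p$ relative to $r$ is defined as

$$ I_\alpha(p\|r) = \frac{1}{\alpha-1} \ln \int_X \frac{p(x)^{\alpha}}{r(x)^{\alpha-1}} \, d\mu(x) , \tag{15} $$

provided the integral on the right exists.

The same can be written in terms of probability measures as

$$ I_\alpha(P\|R) = \frac{1}{\alpha-1} \ln \int_X \left( \frac{dP}{dR} \right)^{\alpha-1} dP = \frac{1}{\alpha-1} \ln \int_X \left( \frac{dP}{dR} \right)^{\alpha} dR , \tag{16} $$

whenever $P \ll R$; $I_\alpha(P\|R) = +\infty$ otherwise. Further, if we assume that $\mu$ in (14) is a
probability measure, then comparing (14) with (16) for $R = \mu$ gives

$$ S_\alpha(P) = -I_\alpha(P\|\mu) , \tag{17} $$

in agreement with the sign in (12).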

On the other hand, it is well known that, unlike Shannon entropy, Kullback-Leibler
relative-entropy in the discrete case can be extended naturally to the measure-theoretic
case, in the sense that the measure-theoretic definition can be obtained as a limit of a
sequence of finite discrete entropies of pmfs which approximate the pdfs involved. This
fact is shown for Renyi relative-entropy in the continuous valued space $\mathbb{R}$ by Renyi [1],
which can be extended to the measure-theoretic case (see [23]).

4. Gelfand-Yaglom-Perez Theorem in the General Case 

In the ergodic approach of information theory, basic definitions of information measures
are given for measurable partitions. Before we proceed to the definitions we give our
notation. Let $(X, \mathfrak{M})$ be a measurable space and $\Pi$ denote the set of all measurable
partitions of $X$. We denote a measurable partition $\pi \in \Pi$ as $\pi = \{E_k\}_{k=1}^{m}$, i.e.,
$\bigcup_{k=1}^{m} E_k = X$ and $E_i \cap E_j = \emptyset$, $i \neq j$, $i, j = 1, \ldots, m$. We denote the set of all simple
functions on $(X, \mathfrak{M})$ by $L_0$ (and the nonnegative ones by $L_0^+$), and the set of all nonnegative
$\mathfrak{M}$-measurable functions by $L^+$. The set of all $\mu$-integrable functions, where $\mu$ is a measure
defined on $(X, \mathfrak{M})$, is denoted by $L^1(\mu)$. Renyi relative-entropy $I_\alpha(P\|R)$ refers to (16),
which can be written as

$$ I_\alpha(P\|R) = \frac{1}{\alpha-1} \ln \int_X \varphi^{\alpha} \, dR , \tag{18} $$

where $\varphi \in L^1(R)$ is defined as $\varphi = \frac{dP}{dR}$.

Let $P$ and $R$ be two probability measures on $(X, \mathfrak{M})$ such that $P \ll R$. Relative
entropy of a partition $\pi = \{E_k\}_{k=1}^{m} \in \Pi$ for $P$ with respect to $R$ is defined as

$$ I_{P\|R}(\pi) = \sum_{k=1}^{m} P(E_k) \ln \frac{P(E_k)}{R(E_k)} . \tag{19} $$

Now, the GYP-theorem for KL-entropy states that

$$ I(P\|R) = \sup_{\pi \in \Pi} I_{P\|R}(\pi) , \tag{20} $$

where $I(P\|R)$ is the measure-theoretic KL-entropy defined in Definition 2.1. When $P$ is
not absolutely continuous with respect to $R$, the GYP-theorem assigns $I(P\|R) = +\infty$. The
proof of the GYP-theorem given by Dobrushin can be found in [11, pp. 23, Theorem
2.4.2] or in [21, pp. 92, Lemma 5.2.3].
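
To see the supremum over partitions at work, consider a hypothetical example chosen here for illustration: $X = [0, 1]$, $R$ the Lebesgue (uniform) measure and $P$ the measure with density $2x$, for which $I(P\|R) = \int_0^1 2x \ln(2x)\,dx = \ln 2 - 1/2$. The sketch below evaluates the partition relative entropy $\sum_k P(E_k) \ln (P(E_k)/R(E_k))$ on partitions into $m$ equal intervals; the function names are assumptions of this sketch.

```python
import math

def P_interval(a, b):
    """P([a, b]) for the density 2x on [0, 1] with respect to Lebesgue measure."""
    return b * b - a * a

def partition_kl(m):
    """Partition relative entropy for [0, 1] split into m equal intervals (R uniform)."""
    total = 0.0
    for k in range(m):
        a, b = k / m, (k + 1) / m
        Pk, Rk = P_interval(a, b), b - a
        total += Pk * math.log(Pk / Rk)
    return total

exact = math.log(2) - 0.5  # closed form of the integral of 2x ln(2x) over [0, 1]
for m in (1, 2, 4, 16, 256):
    print(m, partition_kl(m))
print("exact", exact)
```

Successive refinements should give nondecreasing values that approach the exact value from below, which is the behavior the GYP-theorem describes.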



4.1. GYP for Renyi Relative-Entropy

Before we state and prove the GYP-theorem for Renyi relative-entropy of order $\alpha > 1$,
we state the following lemma.



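Lemma 4.1. Let $P$ and $R$ be probability measures on the measurable space $(X, \mathfrak{M})$ such
that $P \ll R$. Let $\varphi = \frac{dP}{dR}$. Then for any $E \in \mathfrak{M}$ and $\alpha > 1$ we have

$$ \frac{P(E)^{\alpha}}{R(E)^{\alpha-1}} \leq \int_E \varphi^{\alpha} \, dR . \tag{21} $$

Proof. Since $P(E) = \int_E \varphi \, dR$, $\forall E \in \mathfrak{M}$, by Holder's inequality we have

$$ \int_E \varphi \, dR \leq \left( \int_E \varphi^{\alpha} \, dR \right)^{\frac{1}{\alpha}} \left( \int_E dR \right)^{1 - \frac{1}{\alpha}} . $$

That is,

$$ P(E)^{\alpha} \leq R(E)^{\alpha \left( 1 - \frac{1}{\alpha} \right)} \int_E \varphi^{\alpha} \, dR , $$

and hence (21) follows. Since $P \ll R$, it is clear that this inequality reduces to $0 = 0$ if
$R(E) = 0$. □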
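
The inequality of Lemma 4.1 can be sanity-checked on a finite space, where integrals with respect to $R$ become weighted sums. The weights `R`, the density values `phi` and the helper `lemma_sides` below are hypothetical choices made for this sketch; it only tests the inequality on one example and is not a substitute for the proof.

```python
import itertools

R = [0.1, 0.2, 0.3, 0.4]             # hypothetical reference probabilities R({k})
phi = [2.5, 0.5, 1.0, 0.875]         # hypothetical density dP/dR at each point
P = [r * f for r, f in zip(R, phi)]  # P({k}) = phi_k * R({k}); these sum to 1

def lemma_sides(E, alpha):
    """Return (P(E)^alpha / R(E)^(alpha - 1), integral over E of phi^alpha dR)."""
    PE = sum(P[k] for k in E)
    RE = sum(R[k] for k in E)
    lhs = PE ** alpha / RE ** (alpha - 1)
    rhs = sum(phi[k] ** alpha * R[k] for k in E)
    return lhs, rhs

# Check the inequality on every nonempty subset E for a few orders alpha > 1.
for alpha in (1.5, 2.0, 5.0):
    for size in range(1, len(R) + 1):
        for E in itertools.combinations(range(len(R)), size):
            lhs, rhs = lemma_sides(E, alpha)
            assert lhs <= rhs + 1e-12, (E, alpha)
print("inequality holds on every nonempty subset")
```

On singletons the two sides coincide, since the density is constant there; this is the equality case of Holder's inequality.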

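First we present our main result in its special case as follows.

Lemma 4.2. Let $P$ and $R$ be two probability measures such that $P \ll R$. Let
$\varphi = \frac{dP}{dR} \in L_0^+$. Then for any $0 < \alpha < \infty$, $\alpha \neq 1$, we have

$$ I_\alpha(P\|R) = \frac{1}{\alpha-1} \ln \sum_{k=1}^{m} \frac{P(E_k)^{\alpha}}{R(E_k)^{\alpha-1}} , \tag{22} $$

where $\{E_k\}_{k=1}^{m} \in \Pi$ is the measurable partition corresponding to $\varphi$.

Proof. The simple function $\varphi \in L_0^+$ can be written as $\varphi(x) = \sum_{k=1}^{m} a_k \chi_{E_k}(x)$, $\forall x \in X$,
where $a_k \in \mathbb{R}$, $k = 1, \ldots, m$. Now we have $P(E_k) = \int_{E_k} \varphi \, dR = a_k R(E_k)$, and hence

$$ a_k = \frac{P(E_k)}{R(E_k)} , \quad \forall k = 1, \ldots, m . \tag{23} $$

We also have $\varphi^{\alpha}(x) = \sum_{k=1}^{m} a_k^{\alpha} \chi_{E_k}(x)$, $\forall x \in X$, and hence

$$ \int_X \varphi^{\alpha} \, dR = \sum_{k=1}^{m} a_k^{\alpha} R(E_k) . \tag{24} $$

Now, from (18), (23) and (24) one obtains (22). □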
Now we state and prove the GYP-theorem for Renyi relative-entropy.

Theorem 4.3. Let $(X, \mathfrak{M})$ be a measurable space and $\Pi$ denote the set of all measurable
partitions of $X$. Let $P$ and $R$ be two probability measures. Then for any $\alpha > 1$, we have

$$ I_\alpha(P\|R) = \sup_{\{E_k\}_{k=1}^{m} \in \Pi} \frac{1}{\alpha-1} \ln \sum_{k=1}^{m} \frac{P(E_k)^{\alpha}}{R(E_k)^{\alpha-1}} , \tag{25} $$

if $P \ll R$; otherwise $I_\alpha(P\|R) = +\infty$.

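Proof. If $P$ is not absolutely continuous with respect to $R$, then there exists $E \in \mathfrak{M}$ such
that $P(E) > 0$ and $R(E) = 0$. Since $\{E, X - E\} \in \Pi$, the supremum over partitions is $+\infty$,
in agreement with $I_\alpha(P\|R) = +\infty$.

Now, we assume that $P \ll R$. It is clear that it is enough to prove that

$$ \int_X \varphi^{\alpha} \, dR = \sup_{\{E_k\}_{k=1}^{m} \in \Pi} \sum_{k=1}^{m} \frac{P(E_k)^{\alpha}}{R(E_k)^{\alpha-1}} , \tag{26} $$

where $\varphi = \frac{dP}{dR}$. From Lemma 4.1, for any measurable partition $\{E_k\}_{k=1}^{m} \in \Pi$, we have

$$ \sum_{k=1}^{m} \frac{P(E_k)^{\alpha}}{R(E_k)^{\alpha-1}} \leq \sum_{k=1}^{m} \int_{E_k} \varphi^{\alpha} \, dR = \int_X \varphi^{\alpha} \, dR , $$

and hence

$$ \sup_{\{E_k\}_{k=1}^{m} \in \Pi} \sum_{k=1}^{m} \frac{P(E_k)^{\alpha}}{R(E_k)^{\alpha-1}} \leq \int_X \varphi^{\alpha} \, dR . \tag{27} $$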

Now we shall obtain the reverse inequality to prove (26). That is, we shall obtain

$$ \sup_{\{E_k\}_{k=1}^{m} \in \Pi} \sum_{k=1}^{m} \frac{P(E_k)^{\alpha}}{R(E_k)^{\alpha-1}} \geq \int_X \varphi^{\alpha} \, dR . \tag{28} $$

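Note that corresponding to any $\varphi \in L^+$, there exists a sequence of simple functions
$\{\varphi_n\}$, $\varphi_n \in L_0^+$, which satisfies

$$ 0 \leq \varphi_1 \leq \varphi_2 \leq \cdots \leq \varphi \tag{29} $$

such that $\lim_{n \to \infty} \varphi_n = \varphi$ (see Theorem 1.8(2) of the measure theory reference). $\{\varphi_n\}$ induces
a sequence of measures $\{P_n\}$ on $(X, \mathfrak{M})$ defined by

$$ P_n(E) = \int_E \varphi_n(x) \, dR(x) , \quad \forall E \in \mathfrak{M} . \tag{30} $$

We have $\int_E \varphi_n \, dR \leq \int_E \varphi \, dR < \infty$, $\forall E \in \mathfrak{M}$, and hence $P_n \ll R$, $\forall n$. Since
$0 \leq \varphi_n \leq \varphi \in L^1(R)$, the Lebesgue dominated convergence theorem gives

$$ \lim_{n \to \infty} P_n(E) = P(E) , \quad \forall E \in \mathfrak{M} . \tag{31} $$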

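Now, $\varphi_n^{\alpha} \in L_0^+$, $\varphi_n^{\alpha} \leq \varphi_{n+1}^{\alpha} \leq \varphi^{\alpha}$, $1 \leq n < \infty$, and $\lim_{n \to \infty} \varphi_n^{\alpha} = \varphi^{\alpha}$ for any $\alpha > 0$. Hence
from the Lebesgue monotone convergence theorem (pp. 21) we have

$$ \lim_{n \to \infty} \int_X \varphi_n^{\alpha} \, dR = \int_X \varphi^{\alpha} \, dR . \tag{32} $$

The claim is that (32) implies

$$ \int_X \varphi^{\alpha} \, dR = \sup \left\{ \int_X \phi \, dR \ \middle| \ 0 \leq \phi \leq \varphi^{\alpha} , \ \phi \in L_0^+ \right\} . \tag{33} $$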

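This can be verified as follows. Denote $\phi_n = \varphi_n^{\alpha}$. We have $0 \leq \phi_n \leq \varphi^{\alpha}$, $\forall n$, $\phi_n \uparrow \varphi^{\alpha}$,
and

$$ \lim_{n \to \infty} \int_X \phi_n \, dR = \int_X \varphi^{\alpha} \, dR . \tag{34} $$

For any $\phi \in L_0^+$ such that $0 \leq \phi \leq \varphi^{\alpha}$ we have

$$ \int_X \phi \, dR \leq \int_X \varphi^{\alpha} \, dR $$

and hence

$$ \sup \left\{ \int_X \phi \, dR \ \middle| \ 0 \leq \phi \leq \varphi^{\alpha} , \ \phi \in L_0^+ \right\} \leq \int_X \varphi^{\alpha} \, dR . \tag{35} $$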



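Now we get the reverse inequality of (35). If $\int_X \varphi^{\alpha} \, dR < +\infty$, from (34), given any $\epsilon > 0$
one can find $0 \leq n_0 < \infty$ such that

$$ \int_X \varphi^{\alpha} \, dR < \int_X \phi_{n_0} \, dR + \epsilon $$

and hence

$$ \int_X \varphi^{\alpha} \, dR < \sup \left\{ \int_X \phi \, dR \ \middle| \ 0 \leq \phi \leq \varphi^{\alpha} , \ \phi \in L_0^+ \right\} + \epsilon . \tag{36} $$

Since (36) is true for any $\epsilon > 0$ we can write

$$ \int_X \varphi^{\alpha} \, dR \leq \sup \left\{ \int_X \phi \, dR \ \middle| \ 0 \leq \phi \leq \varphi^{\alpha} , \ \phi \in L_0^+ \right\} . \tag{37} $$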

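Now let us verify (37) in the case of $\int_X \varphi^{\alpha} \, dR = +\infty$. In this case, $\forall N > 0$, one can
choose $n_0$ such that $\int_X \phi_{n_0} \, dR > N$ and hence

$$ \int_X \varphi^{\alpha} \, dR > N \quad (\because \ 0 \leq \phi_{n_0} \leq \varphi^{\alpha}) \tag{38} $$

and

$$ \sup \left\{ \int_X \phi \, dR \ \middle| \ 0 \leq \phi \leq \varphi^{\alpha} , \ \phi \in L_0^+ \right\} > N . \tag{39} $$

Since (38) and (39) are true for any $N > 0$ we have

$$ \int_X \varphi^{\alpha} \, dR = \sup \left\{ \int_X \phi \, dR \ \middle| \ 0 \leq \phi \leq \varphi^{\alpha} , \ \phi \in L_0^+ \right\} = +\infty \tag{40} $$

and hence (37) is verified in the case of $\int_X \varphi^{\alpha} \, dR = +\infty$. Now (35) and (37) verify the
claim that (32) implies (33). Finally, (33) together with Lemma 4.2 proves (26) and
hence the theorem. In more detail, the computation in the proof of Lemma 4.2 applied to $\varphi_n$,
with $\{E_k\}_{k=1}^{m}$ the partition corresponding to $\varphi_n$, gives
$\int_X \varphi_n^{\alpha} \, dR = \sum_{k=1}^{m} P_n(E_k)^{\alpha} / R(E_k)^{\alpha-1} \leq \sum_{k=1}^{m} P(E_k)^{\alpha} / R(E_k)^{\alpha-1}$,
because $P_n(E_k) \leq P(E_k)$; letting $n \to \infty$ and using (32) yields (28). □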
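
The statement of Theorem 4.3 can be illustrated numerically on a hypothetical example chosen for this sketch: $X = [0, 1]$, $R$ the Lebesgue measure and $P$ with density $\varphi(x) = 2x$, so that $\int_X \varphi^{\alpha}\,dR = 2^{\alpha}/(\alpha+1)$. The code evaluates the partition expression on $m$ equal intervals; function names are assumptions, and equal-interval partitions are only one convenient family, not the full supremum over $\Pi$.

```python
import math

def partition_sum(m, alpha):
    """sum_k P(E_k)^alpha / R(E_k)^(alpha - 1) over m equal intervals of [0, 1]."""
    total = 0.0
    for k in range(m):
        a, b = k / m, (k + 1) / m
        Pk, Rk = b * b - a * a, b - a  # P has density 2x, R is Lebesgue measure
        total += Pk ** alpha / Rk ** (alpha - 1)
    return total

def renyi_partition(m, alpha):
    """Renyi relative-entropy of the m-interval partition, for alpha > 1."""
    return math.log(partition_sum(m, alpha)) / (alpha - 1)

def renyi_exact(alpha):
    """ln(2^alpha / (alpha + 1)) / (alpha - 1), the closed form for the density 2x."""
    return math.log(2.0 ** alpha / (alpha + 1)) / (alpha - 1)

for m in (1, 2, 8, 64, 1024):
    print(m, renyi_partition(m, 2.0))
print("exact", renyi_exact(2.0))
```

For $\alpha = 2$ the partition sum has the closed form $(4m^2 - 1)/(3m^2)$, which increases to $4/3 = \int_0^1 4x^2\,dx$ as $m$ grows, so the partition values approach $\ln(4/3)$ from below.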
4.2. GYP for Tsallis Relative-Entropy

Due to an increasing interest in long-range correlated systems and non-equilibrium
phenomena, there has recently been much focus on the Tsallis (or nonextensive)
entropy. Although first introduced by Havrda and Charvat [26] in the context of
cybernetics theory and later studied by Daroczy [27], it was Tsallis [28] who exploited
its nonextensive features and placed it in a physical setting. Tsallis entropy of a pdf $p$
defined on $(X, \mathfrak{M}, \mu)$ can be defined as

$$ S_q(p) = \int_X p(x) \ln_q \frac{1}{p(x)} \, d\mu(x) = \frac{1 - \int_X p(x)^{q} \, d\mu(x)}{q - 1} , \tag{41} $$

provided the integral on the right exists and $q \in \mathbb{R}$, $q > 0$, $q \neq 1$. $\ln_q$ in (41) is referred
to as the $q$-logarithm and is defined as $\ln_q x = \frac{x^{1-q} - 1}{1 - q}$ ($x > 0$, $q \in \mathbb{R}$, $q \neq 1$). Tsallis entropy
too, like Renyi entropy, is a one-parameter generalization of Shannon entropy in the
sense that $q \to 1$ in (41) retrieves Shannon entropy. Tsallis entropy of a
$\mu$-continuous probability measure $P$ can be written as

$$ S_q(P) = \int_X \ln_q \left( \frac{dP}{d\mu} \right)^{-1} dP . \tag{42} $$

In this framework, Tsallis relative-entropy is defined as

$$ I_q(p\|r) = -\int_X p(x) \ln_q \frac{r(x)}{p(x)} \, d\mu(x) = \frac{\int_X \frac{p(x)^{q}}{r(x)^{q-1}} \, d\mu(x) - 1}{q - 1} , \tag{43} $$

provided all the integrals mentioned above exist and $q \in \mathbb{R}$, $q > 0$. The same can
be written for two probability measures $P$ and $R$ as

$$ I_q(P\|R) = -\int_X \ln_q \left( \frac{dP}{dR} \right)^{-1} dP , \tag{44} $$

whenever $P \ll R$; $I_q(P\|R) = +\infty$ otherwise. If $\mu$ in (42) is a probability measure, then
comparing (42) with (44) for $R = \mu$ we have

$$ S_q(P) = -I_q(P\|\mu) . \tag{45} $$

Now, from the fact that Renyi and Tsallis relative-entropies ((16) and (44),
respectively) are monotone and continuous functions of each other, the GYP-theorem
presented in the case of Renyi is valid for the Tsallis case too, whenever $q > 1$.
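
The monotone relation invoked here can be made explicit under the reconstruction used in this sketch: for equal orders $\alpha = q$, both quantities are functions of the same integral $\int_X (dP/dR)^{q-1}\,dP$, which gives $I_q = (e^{(q-1) I_\alpha} - 1)/(q-1)$. For $q > 1$ this map is increasing and continuous, so a supremum over partitions passes through it. The discrete functions below are illustrative assumptions that check the identity numerically.

```python
import math

def renyi_relative_entropy(p, r, alpha):
    """ln(sum_k p_k^alpha / r_k^(alpha - 1)) / (alpha - 1); assumes r_k > 0 wherever p_k > 0."""
    s = sum(pk ** alpha / rk ** (alpha - 1) for pk, rk in zip(p, r) if pk > 0)
    return math.log(s) / (alpha - 1)

def tsallis_relative_entropy(p, r, q):
    """(sum_k p_k^q / r_k^(q - 1) - 1) / (q - 1); assumes r_k > 0 wherever p_k > 0."""
    s = sum(pk ** q / rk ** (q - 1) for pk, rk in zip(p, r) if pk > 0)
    return (s - 1.0) / (q - 1)

def tsallis_from_renyi(renyi_value, q):
    """Map a Renyi value of order q to the Tsallis value: (exp((q - 1) I) - 1) / (q - 1)."""
    return math.expm1((q - 1) * renyi_value) / (q - 1)

p = [0.5, 0.25, 0.125, 0.125]  # hypothetical pmfs
r = [0.25, 0.25, 0.25, 0.25]
q = 2.5
print(tsallis_relative_entropy(p, r, q))
print(tsallis_from_renyi(renyi_relative_entropy(p, r, q), q))  # same value up to rounding
```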



5. Conclusions 



Relative-entropy or KL-entropy is an important concept in information theory, since 
information measures like entropy and mutual information can be formulated as special 
cases. Further, KL-entropy overcomes the shortcomings of entropy in non-discrete 
settings. Note that all the above hold even for generalized information measures. 

The GYP-theorem provides a means to compute KL-entropy and to study its
behavior [21]. In this paper, we presented the measure-theoretic definitions of
generalized information measures. We stated and proved the GYP-theorem for
generalized relative entropies of order $\alpha > 1$ ($q > 1$ for the Tsallis case). However,
results are yet to be achieved for the case $0 < \alpha < 1$.